Developing and improving a statistical machine translation system for English to Setswana: a linguistically-motivated approach

Authors

  • Ilana Wilken
  • Marissa Griesel
  • Cindy McKellar
Abstract

This paper describes the methods followed in developing and improving a statistical machine translation system for translation from English to Setswana. Setswana is regarded as a resource-scarce language, so an adequate amount of parallel data is not freely available. The methods attempt to improve the quality of the machine translation output by manipulating the data during processing. They include the creation of sentence reordering, term deletion and term replacement rules. The rules were applied to training and testing data in the pre- and post-processing stages of development. The resulting systems were compared with one another to determine whether the quality of the machine translation improved.

Keywords: statistical machine translation, pre-processing, post-processing, sentence reordering, English, Setswana, term replacement, term deletion
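The three rule types named in the abstract (term replacement, term deletion and sentence reordering) can be illustrated with a minimal Python sketch of a pre-processing step. Everything in it is an assumption made for illustration: the rule tables `REPLACE_TERMS`, `DELETE_TERMS` and `SWAP_TAG_PAIRS`, the part-of-speech tags, and the function `preprocess` are hypothetical and are not the rules or tooling used by the authors.

```python
from typing import Dict, List, Set, Tuple

TaggedToken = Tuple[str, str]  # (word, part-of-speech tag)

# Hypothetical toy rules; the paper's actual rules are not given in the abstract.
REPLACE_TERMS: Dict[str, str] = {"colour": "color"}        # term replacement
DELETE_TERMS: Set[str] = {"the"}                           # term deletion
SWAP_TAG_PAIRS: Set[Tuple[str, str]] = {("ADJ", "NOUN")}   # reordering


def preprocess(tagged: List[TaggedToken]) -> str:
    """Apply replacement, deletion, then adjacent-pair reordering."""
    # 1. Replace single terms, leaving unknown words unchanged.
    tokens = [(REPLACE_TERMS.get(w.lower(), w), t) for w, t in tagged]
    # 2. Delete terms listed in DELETE_TERMS.
    tokens = [(w, t) for w, t in tokens if w.lower() not in DELETE_TERMS]
    # 3. Swap adjacent tokens whose tag pair matches a reordering rule.
    reordered: List[TaggedToken] = []
    i = 0
    while i < len(tokens):
        if (i + 1 < len(tokens)
                and (tokens[i][1], tokens[i + 1][1]) in SWAP_TAG_PAIRS):
            reordered.extend([tokens[i + 1], tokens[i]])
            i += 2
        else:
            reordered.append(tokens[i])
            i += 1
    return " ".join(w for w, _ in reordered)


if __name__ == "__main__":
    sentence = [("the", "DET"), ("big", "ADJ"),
                ("colour", "NOUN"), ("fades", "VERB")]
    print(preprocess(sentence))
```

In a setup like the one the abstract describes, a step of this kind would be applied to the source side of both the training and the testing data before the translation model sees it, with a matching post-processing step applied to the output.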


Similar resources

Linguistically Annotated BTG for Statistical Machine Translation

Bracketing Transduction Grammar (BTG) is a natural choice for effective integration of desired linguistic knowledge into statistical machine translation (SMT). In this paper, we propose a Linguistically Annotated BTG (LABTG) for SMT. It conveys linguistic knowledge of source-side syntax structures to BTG hierarchical structures through linguistic annotation. From the linguistically annotated da...


TÜBİTAK-BİLGEM German-English Machine Translation Systems for WMT'13

This paper describes the TÜBİTAK-BİLGEM statistical machine translation (SMT) systems submitted to the Eighth Workshop on Statistical Machine Translation (WMT) shared translation task for the German-English language pair in both directions. We implement phrase-based SMT systems with standard parameters. We present the results of using a large tuning data set and the effect of averaging tuning weights of diff...



Semantically-Informed Syntactic Machine Translation: A Tree-Grafting Approach

We describe a unified and coherent syntactic framework for supporting a semantically-informed syntactic approach to statistical machine translation. Semantically enriched syntactic tags assigned to the target-language training texts improved translation quality. The resulting system significantly outperformed a linguistically naive baseline model (Hiero), and reached the highest scores yet repor...


A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...




Publication date: 2012